Abstract

Background: The endothelial PAS domain protein 1 (EPAS1) activates genes that are involved in erythropoiesis and
angiogenesis, thus favoring a better delivery of oxygen to the tissues, and is a plausible candidate to influence
athletic performance. Using innovative statistical methods we compared genotype distributions and interactions of
EPAS1 SNPs rs1867785, rs11689011, rs895436, rs4035887 and rs1867782 between sprint/power athletes (n = 338),
endurance athletes (n = 254), and controls (n = 603) in Polish and Russian samples. We also examined the association
between these SNPs and the athletes' competition level ('elite' and 'sub-elite' level). Genotyping was performed by
either Real-Time PCR or the Single-Base Extension (SBE) method.

Results: In the pooled cohort of Polish and Russian athletes, 1) rs1867785 was associated with sprint/power athletic
status; the AA genotype in rs1867785 was underrepresented in the sprint/power athletes, 2) rs11689011 was also
associated with sprint/power athletic status; the TT genotype in rs11689011 was underrepresented in sprint/power
athletes, and 3) the interaction between rs1867785, rs11689011, and rs4035887 was associated with sprint/power
athletic performance; the combinations of the AA genotype in rs4035887 with either the AG or GG genotypes in
rs1867785, or with the CT or CC genotypes in rs11689011, were underrepresented in two cohorts of sprint/power
athletes.

Conclusions: Based on the unique statistical model, rs1867785/rs11689011 are strong predictors of sprint/power
athletic status, and the interaction between rs1867785, rs11689011, and rs4035887 might contribute to success in
sprint/power athletic performance.

Background

Maximal oxygen uptake (VO2max) refers to the highest
rate at which oxygen can be consumed by the body during
intense exercise [1], and is, among other factors, an
important predictor of elite endurance performance [2].
Studies have shown that the changes in VO2max following
exercise training vary markedly between individuals
and that ~50% of the variance can be explained by
genetic factors [3]. Over the last two decades, many Single
Nucleotide Polymorphisms (SNPs) have been suggested
to influence elite performance and the variability
in VO2max increase following exercise training [4]. While
most of these SNPs were discovered using the candidate
gene approach [5], a more comprehensive, genome-wide
linkage approach has identified a genomic region on
chromosome 2 that is associated with the VO2max training
response. The endothelial PAS domain protein 1 (EPAS1)
gene was one of the four genes in this region responsible
for this linkage [6].

EPAS1 is a transcription factor playing a key role in
the Hypoxia Inducible Factor (HIF) pathway in blood,
which is responsible for activating gene expression in response
to hypoxia [7]. In normal oxygen conditions, EPAS1
is quickly degraded in the cytoplasm. However, when
oxygen levels drop, EPAS1 becomes stabilized, translocates
to the nucleus and activates genes that are involved in
erythropoiesis (e.g., erythropoietin) and angiogenesis
(e.g., vascular endothelial growth factor), thus favoring
a better delivery of oxygen to the tissues [8]. Delivery
of oxygen to skeletal muscles during endurance exercise is
viewed as a factor limiting VO2max [1]. Therefore, as a hypoxia
detector and as an activator of improved oxygen delivery
to the active tissues, EPAS1 is a plausible candidate
to influence endurance performance.

EPAS1 SNPs have been previously associated with
blood parameters, such as alterations in erythropoietin,
hemoglobin and hematocrit [9-11], that are important
for success in athletic performance. For instance, Tibetans
with the TT genotype in registered SNP (rs) 11689011 had
lower hemoglobin concentration compared with their TC
counterparts [9]. However, to date, only one study has
examined a possible link between SNPs within EPAS1
and elite athletic performance [12]. This study looked
at SNPs and haplotypes within EPAS1 in elite Australian
athletes, stratified into two groups participating in middle-distance
(from 50 s to 10 min, n = 242) and long-distance
(from ~2 to 10 h, n = 151) events. These groups were
compared to a non-athletic control group. The T allele in
rs11689011 and the G allele in rs1867785, two SNPs located
in the first intron of EPAS1, were overrepresented in
the group of endurance athletes compared with controls
[12]. Furthermore, in the same study, two haplotypes involving
rs1867785, rs11689011, rs895436 and rs4035887
were associated with elite endurance performance. While
haplotype G (A-T-G-G) was overrepresented in elite
endurance athletes, haplotype F (G-C-C-G) was underrepresented
in elite endurance athletes compared to controls
[12]. Despite these positive findings, and the strong
biological rationale behind investigating EPAS1 in relation
to elite endurance performance, this is the only genetic
association study showing that EPAS1 SNPs impact
performance, and sprinters, consisting primarily of
100-400 m track runners and sprint cyclists, were excluded
from the analysis. Replication studies are therefore needed
to confirm this association, particularly in different populations,
and with a larger sample size [13].

Therefore, the aim of this study was to compare
genotype distributions and interactions of the EPAS1
SNPs rs1867785, rs11689011, rs895436, rs4035887 and
rs1867782 between sprint/power athletes, endurance
athletes, and controls in Polish and Russian cohorts.
We also examined the association between the EPAS1
SNPs and athletic status according to the athletes' level
of competition ('elite' and 'sub-elite' level). In light of the
relationship previously observed between endurance-related
phenotypes and SNPs in EPAS1, we hypothesized
that EPAS1 SNPs would be associated with elite endurance
performance compared to controls and sprint/power
athletes. We did not have specific directional hypotheses
for rs895436, rs4035887 and rs1867782 as these
SNPs have not previously been associated with any performance
and/or endurance-related phenotypes. We did
not have any directional hypothesis for rs11689011 either,
due to the conflicting results reported for this SNP
(the T allele in rs11689011 was associated with endurance
athletic status in elite Australians on one hand, and with
lower hemoglobin concentrations in Tibetans, on the
other hand) [9,12]. However, in line with the findings of
Henderson et al. [12] we did expect the rs1867785 G allele
to be associated with elite endurance performance. Finally,
we did not have any directional hypotheses for any of the
SNPs in relation to sprint/power performance.

Methods 

The study was approved by the Pomeranian Medical 
University Ethics Committee, Poland, and the Ural State 
University of Physical Culture, Russia, and written in- 
formed consent was obtained from each participant. 
The study complied with the guidelines set out in the 
Declaration of Helsinki and the ethics policy of the 
Szczecin University [14]. 

Participants 

The athletes and controls were all European Caucasians. 
The athletes were categorized as either endurance athletes 
or sprint/power athletes as determined by the distance, 
duration and energy requirements of their event/sport. All 
athletes were ranked in the top 10 nationally in their sport 
discipline and grouped as being either 'elite-level' or 'sub- 
elite' based on their best personal performance. Those in 
the elite group had participated in international competi- 
tions such as World and European Championships, and/ 
or Olympic Games, whereas those in the sub-elite group 
had participated in national competitions only. Details on 
the number of participants in the elite and sub-elite group 
are presented in Table 1. 

Polish sample 

The sample comprised 198 Polish athletes (all men; mean 
age ± SD, 28 ± 4.4 y), including 92 elite and sub-elite
sprint/power athletes and 106 elite and sub-elite endur- 
ance athletes, as well as 428 healthy, unrelated, sedentary 
controls (all male students of the University of Szczecin; 
mean age ± SD, 20.8 ± 1.2 y). 

Russian sample 

The Russian sample comprised 394 athletes (287 men and
107 women; mean age ± SD, 27.8 ± 9.7 y), including 246
elite and sub-elite sprint/power athletes and 148 elite and
sub-elite endurance athletes, as well as 175 healthy unrelated
sedentary controls (104 men and 44 women, all
students or employees of the Ural State University of
Physical Culture; mean age ± SD, 30.2 ± 10.7 y). The description
of the Polish and the Russian athletes according
to their event/sport is summarized in Table 1.

Table 1 Athletes' description

Polish athletes (n = 196)               Russian athletes (n = 394)
Elite (n = 122)   Sub-elite (n = 74)    Elite (n = 131)   Sub-elite (n = 263)

Genotyping 
Polish sample 

Genomic DNA was isolated from buccal epithelium
using GenElute Mammalian Genomic DNA Miniprep
Kit (Sigma, Hamburg, Germany) according to the manufacturer's
instructions. All genetic analyses were performed at
the Molecular Biology and Biotechnology Center, Faculty of
Biology, University of Szczecin. All samples were genotyped
in duplicate using allelic discrimination assays with TaqMan®
probes (Applied Biosystems, Carlsbad, California, USA)
on a CFX96 Touch™ Real-Time PCR Detection System
(Bio-Rad, Hercules, California, USA). To discriminate
EPAS1 rs1867782, rs1867785, rs11689011, rs895436
and rs4035887 alleles, TaqMan® Pre-Designed SNP
Genotyping Assays were used (assay IDs: C 11639978_1_,
C__11639984_10, C 2148918_10, C 2148915_10,
C 2162989_10, respectively), including appropriate
primers and fluorescently labeled (FAM and VIC) MGB™
probes to detect the alleles. Genotypes were assigned using
all of the data from the study simultaneously.

Russian sample 

Genomic DNA was isolated from buccal epithelium or
peripheral blood, during the years 2011-2013, using the
Diatom™ DNA Prep kit (Cat. # D 1025, IsoGene Lab Ltd,
Russia). Genotyping of five selected SNPs was performed
by the Single-Base Extension (SBE) method. The sequence
surrounding each SNP was obtained from the Genome
Reference Consortium Human genome build 37 assembly
from the Ensembl Project [15]. The Primer3web software
v. 4.0.0 [16] was used for designing the PCR primers. The PCR
product range was 109-173 bp. SBE primers to detect
rs895436, rs11689011, rs1867782, and rs1867785 were
designed to anneal on the positive strand immediately
adjacent to the single nucleotide variation sites. The SBE primer
for detection of rs4035887 was designed to anneal
on the negative strand. To avoid any non-specific
amplification and extension products, all primers were
BLASTed against the human genome reference sequence.
Sets of preselected PCR primer pairs and SBE primers
were screened for potential cross-reactivity by using
AutoDimer software.

Multiplex PCR was performed in a volume of 15 µL
containing 1 x PCR buffer, 1.0 mM MgCl2, 0.2 mM
dNTPs, 0.7 µM of each primer (5 pairs), 1 unit SmarTaq
DNA polymerase (Dialat Ltd, Russia) and 5 ng of template
DNA. Thermal cycler conditions were: 95°C for
30 s, 30 cycles of 95°C for 45 s, 60°C for 45 s, 72°C for
60 s and finally 10 min at 72°C in a GeneAmp® PCR
System 9700 (Applied Biosystems). Multiplex PCR products
were checked for quality and yield by running 3 µL in
2% agarose-TBE gels. 5 µL of PCR products were cleaned
with 1 unit of FastAP Thermosensitive Alkaline Phosphatase
(TAP) and 10 units of Exonuclease I (both enzymes from
Fermentas). The multiplex SBE reaction was performed using the
SNaPshot® Multiplex Kit (Applied Biosystems) in a 5 µL
final volume, including 2.5 µL of SNaPshot Multiplex
Ready Reaction Mix, 1.0 µL pooled SBE primers and
1.5 µL of cleaned PCR product (the PCR sequences, the
SBE primers and their final concentrations are available
from the authors on request). The cycling conditions were
25 cycles of 96°C for 10 s, 50°C for 5 s and 60°C for 30 s in a
GeneAmp® PCR System 9700 (Applied Biosystems). To
remove the unincorporated ddNTPs, the final product
was incubated with 1 unit of TAP (Fermentas). SNaPshot
products with GeneScan™-120 LIZ™ Size Standard (AB)
were diluted in Hi-Di™ Formamide (AB), denatured and
separated using an ABI PRISM 310 Genetic Analyzer (AB)
with a 47 cm length capillary and POP-4™ polymer (AB).
The SNaPshot® Primer Focus® Kit (AB) was used to analyze
individual SBE primers for their approximate sizing locations
prior to performing the multiplex SBE reaction. Final
data were analyzed using the GeneMapper® Software v. 4.1
from Applied Biosystems following the software manual.

K562 DNA High Molecular Weight from Promega
Corp. (USA) served as the positive control sample. The genetic
profile of K562 DNA was as follows: rs895436 - G/G,
rs11689011 - C/C, rs4035887 - T/T (for the negative
strand), rs1867785 - G/G, rs1867782 - C/C.

Each of the five PCR products (for the five different SNPs)
was generated from five different individuals and was sequenced
in a separate reaction using the BigDye® Terminator
v3.1 Cycle Sequencing Kit (Applied Biosystems), with
full agreement between expected and observed sequences.
Sequencing was performed in a second laboratory
(Gordiz Ltd. Laboratory, Moscow, Russia), according
to the latest recommendations [17].

Genotyping reliability across two laboratories 

Genotyping was performed in duplicate in the same
laboratory for accuracy. Two independent investigators
called the genotyping score in each laboratory; 100%
of the genotypes could be called. To check the reliability of
results across two laboratories in two different
countries (Russia and Poland), different DNA samples
(one for each SNP, positive or negative controls) were
shipped from Russia to Poland and were genotyped by
TaqMan assays. The results of the genotyping were in
100% agreement across the two laboratories.

Statistical analysis 

The genotype frequencies of all individual SNPs are
presented in Additional file 1. Before looking at both
the individual effects of the five EPAS1 SNPs and their
interactions, we selected the best genetic model for
each SNP. Then, SNP main effects, as well as SNP-SNP
interactions, were investigated using Multivariate Adaptive
Regression Splines (MARS), a nonparametric regression
method [18] that has been successfully applied for detecting
SNP-SNP interaction in several studies [19-22]. Finally, the
odds ratios (OR) of being either a sprint/power athlete or
an endurance athlete were calculated for each significant
SNP and significant interacting pairs of SNPs using the best
genetic model for each SNP. Details on the steps that have
been taken in the statistical analysis are shown in Figure 1.

Hardy-Weinberg equilibrium (HWE) 

HWE analysis was used to confirm whether the control group, in
each of the two samples, met HWE expectations. HWE
was tested separately for each SNP.
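
The text does not state which HWE test was applied. As a minimal sketch, assuming a one-degree-of-freedom chi-square goodness-of-fit test on the genotype counts of a biallelic SNP, the check could look as follows. The function name and the example counts are hypothetical and are not taken from the study data.

```python
import math


def hwe_chi_square(n_hom1, n_het, n_hom2):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium.

    Assumption: a biallelic SNP and a 1-df chi-square test; the study
    itself does not specify the test that was used.
    Returns (chi2, p_value).
    """
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of the first allele
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical control-group counts (not from the paper)
chi2, p_value = hwe_chi_square(30, 50, 20)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

A p-value above 0.05 would be read, as in the Results, as agreement with HWE expectations.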

Selection of the best genetic model for each SNP 

Three inheritance models (dominant, recessive and additive 
model) were assessed in the pooled cohorts of sprint/power 
and endurance athletes (Polish and Russian) using the 
Wald test in logistic regression models, and the best 
model was selected based on the minimum p-value for each 
SNP. The athletic status was divided into two categories 
and encoded as a dummy variable: athlete (encoded as 1) 
and control (encoded as 0). To adjust for multiple 
comparisons, the false discovery rate was controlled 
using the Benjamini and Hochberg correction [23]. For 
consistency, the same genetic model was applied separately 
in the Russian and Polish groups. All of the following 
steps were performed once when comparing sprint/power 
athletes with controls, and once again when comparing 
endurance athletes with controls. 
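
The two ingredients of this step, coding a genotype under the three inheritance models and controlling the false discovery rate, can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, genotypes are assumed to be two-letter strings such as "GA", and the logistic regression and Wald test themselves are not reproduced here.

```python
def encode_genotype(genotype, allele, model):
    """Code a genotype (e.g. "GA") for a given allele under an inheritance model."""
    count = genotype.count(allele)  # number of copies of the allele: 0, 1 or 2
    if model == "additive":
        return count
    if model == "dominant":
        return 1 if count >= 1 else 0
    if model == "recessive":
        return 1 if count == 2 else 0
    raise ValueError(f"unknown model: {model}")


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for illustration (not from the paper)
print(encode_genotype("AA", "A", "recessive"))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Under this coding, the recessive model for the A allele contrasts AA against GA or GG, which is the contrast reported for rs1867785.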

Analysis of SNP main effects and SNP-SNP interactions 

Variable importance ranking for SNPs with strong Linkage
Disequilibrium (LD) has been shown to be biased in previous
studies [24,25]. Therefore, LD in the 5 SNPs was examined
using r² [26], and the pairs of SNPs displaying a strong
LD (r² > 0.8) were identified (Additional file 2). Only
rs11689011 and rs1867785 were in strong LD (r² = 0.96).
Consequently, two MARS models, both corrected for sex,
were developed: one including rs11689011, rs895436,
rs4035887 and rs1867782, and another independent MARS
model including rs1867785, rs895436, rs4035887 and
rs1867782. For simplicity, we have only reported the
MARS model excluding rs11689011 in the Results section.
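
For reference, r² between two biallelic loci is D²/(pA(1 - pA)pB(1 - pB)), where D = pAB - pA·pB. The sketch below assumes that phased haplotype counts are available; with unphased genotype data, as in this study, haplotype frequencies would first have to be estimated, and that step is not shown. The function name and counts are hypothetical.

```python
def ld_r_squared(n_AB, n_Ab, n_aB, n_ab):
    """r^2 between two biallelic loci from the four haplotype counts.

    Assumption: haplotypes are phased (or their frequencies have already
    been estimated); this is not the study's own pipeline.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # linkage disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Hypothetical haplotype counts (not from the paper)
r2 = ld_r_squared(480, 10, 10, 500)
print(f"r^2 = {r2:.3f}, strong LD: {r2 > 0.8}")
```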

Figure 1 The five steps that were followed in the statistical analysis. MARS: Multivariate Adaptive Regression Splines, BIF: Bootstrap Inclusion
Fraction. Firstly, we selected the best genetic model for each SNP by testing three inheritance models (dominant, recessive and additive model)
for each SNP in the entire cohort of sprint/power athletes. Secondly, MARS was used to detect SNP main effects and SNP-SNP interactions
(rs1867785 and rs11689011 were used in two independent MARS models because of their strong linkage disequilibrium). Thirdly, the covariates
selected by MARS were input into a logistic regression model to determine their significance, and all covariates with p-value > 0.05 were
excluded. Fourthly, to validate the selected covariates, we repeated steps 2 and 3 on 10000 random samples with replacement from the original
dataset and calculated how many times the selected covariates were significant in the 10000 random samples (BIF). All covariates with a
BIF < 50% were excluded. Fifthly, we calculated the odds ratio of the genotype combinations for each selected covariate to give a clear
biological interpretation.

In MARS, the maximum number of basis functions
was set at 100, and the maximum degree of interaction
was set at 2. As MARS does not provide variable significance
using p-values, each covariate selected by
MARS was used as input into a logistic regression
model to determine its significance. All non-significant
covariates (p-value > 0.05) were excluded. To confirm
the significance of the covariates identified by the logistic
regression, and to rank their importance, we used
the Bootstrap Inclusion Fraction (BIF) criterion [27]. We
obtained 10000 MARS models using 10000 random bootstrap
samples with replacement from the original data set.
Then, we calculated the proportion of times that a significant
variable appeared in the 10000 MARS models
and called this number the BIF. A variable which is
approximately uncorrelated with others, and is only
just significant at the chosen nominal α level in the MARS
model, is selected in about 50% of bootstrap samples. As
the p-value diminishes, the BIF tends toward 100%. Thus,
we only included covariates with a BIF > 50%.
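
The bootstrap step can be sketched generically as follows. Because MARS is not part of the Python standard library, the selector below is a deliberately simple stand-in (a 2x2 chi-square test on one binary covariate) and is not the MARS plus logistic regression procedure used in the study. All names and the toy data are hypothetical.

```python
import math
import random


def chi2_selector(sample, alpha=0.05):
    """Stand-in selector: keep 'snp' if a 2x2 chi-square test is significant.

    Each record is (genotype_flag, is_athlete), both coded 0/1.
    """
    a = sum(1 for g, y in sample if g == 1 and y == 1)
    b = sum(1 for g, y in sample if g == 1 and y == 0)
    c = sum(1 for g, y in sample if g == 0 and y == 1)
    d = sum(1 for g, y in sample if g == 0 and y == 0)
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return set()  # degenerate resample: nothing can be tested
    chi2 = (a + b + c + d) * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square, 1 df
    return {"snp"} if p_value < alpha else set()


def bootstrap_inclusion_fraction(data, select, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each covariate is selected."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_boot):
        sample = rng.choices(data, k=len(data))  # resample with replacement
        for name in select(sample):
            counts[name] = counts.get(name, 0) + 1
    return {name: k / n_boot for name, k in counts.items()}


# Toy data with a strong genotype-status association (not from the paper)
toy = [(1, 1)] * 40 + [(0, 0)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10
bif = bootstrap_inclusion_fraction(toy, chi2_selector, n_boot=200, seed=1)
kept = {name for name, frac in bif.items() if frac > 0.5}
print(bif, kept)
```

Covariates whose fraction exceeds 0.5 correspond to the BIF > 50% rule described above.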

For each significant covariate, the odds ratio (OR) of 
being either a sprint/power or an endurance athlete, 
depending on the genotype, was calculated. The 95% 
confidence intervals (CI) were calculated by uncondi- 
tional maximum likelihood estimation. 
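
As a worked example, the odds ratio and a Wald-type 95% confidence interval can be computed from a 2x2 table. The sketch assumes the standard log-odds-ratio standard error; the function name is hypothetical. The counts are the rs1867785 genotype counts for Russian controls and sprint/power athletes listed in Table 3 (AA: 33 vs. 24; GA or GG: 142 vs. 222). The interval obtained this way need not coincide exactly with the published one.

```python
import math


def odds_ratio_wald_ci(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    Assumption: all four cells are non-zero and the interval is built on
    the log scale with SE = sqrt(1/a + 1/b + 1/c + 1/d).
    Returns (odds_ratio, lower, upper).
    """
    odds_ratio = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    se = math.sqrt(
        1 / exp_cases + 1 / unexp_cases + 1 / exp_controls + 1 / unexp_controls
    )
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se), math.exp(log_or + z * se)


# AA vs. GA or GG at rs1867785, Russian sprint/power athletes vs. controls
or_, lower, upper = odds_ratio_wald_ci(24, 222, 33, 142)
print(f"OR = {or_:.2f} (95% CI {lower:.2f}-{upper:.2f})")  # OR is about 0.47
```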



Results 

In the pooled cohort of Russian and Polish controls,
genotype distributions for each of the five SNPs were in
agreement with HWE (p-value > 0.05).

Endurance athletes vs. controls 

In the Russian sample, the MARS model excluding
rs11689011 yielded only one significant covariate, with
a BIF > 50%. An interaction between sex and rs1867785
was observed: the GA and GG genotypes in rs1867785
were underrepresented in women endurance athletes
(OR = 0.39, Table 2).

In the Polish sample, no covariate was selected in the
MARS model excluding rs11689011. When combining the
Russian and the Polish groups, no covariate was selected in
the MARS model excluding rs11689011.

Using the MARS model excluding rs1867785, the results
were almost identical to those of the MARS model excluding
rs11689011 (Additional file 3: Table S2). However, in
the Polish sample one significant covariate was retained
with a BIF > 50%. The TT genotype in rs11689011 was
underrepresented in the cohort of endurance athletes
(OR = 0.49, Additional file 3: Table S2), especially in the
elite-level cohort compared to their sub-elite counterparts
(OR = 0.31, 95% confidence interval: 0.11-0.87, Table 3).

Sprint/power athletes vs. controls 

In the Russian sample, the MARS model excluding
rs11689011 yielded two significant covariates (p-value < 0.05
in logistic regression); only one covariate, however, had
a BIF > 50% (Table 2). The AA genotype in rs1867785
was underrepresented in sprint/power athletes (OR = 0.47,
Figure 2), especially in the elite sprint/power athletes compared
to their sub-elite counterparts (OR = 0.35, 95% confidence
interval: 0.16-1.79, Table 3). In the Polish sample, no
covariates were selected in the MARS model (Table 2).

In the Russian and Polish samples combined, the MARS
model excluding rs11689011 yielded three significant covariates,
two of which had a BIF > 50% (Table 2). The trend already
observed for rs1867785 in the Russian sprint/power group
was even stronger when combined with the Polish group
(BIF = 90.1%); the AA genotype in rs1867785 was underrepresented
in sprint/power athletes (OR = 0.53, Figure 2),
especially in elite sprint/power athletes compared to
their sub-elite counterparts (OR = 0.41, 95% confidence
interval: 0.22-0.75). Furthermore, an interaction between
rs1867785 and rs4035887 was found; the combination of
the AA genotype in rs4035887 and the GA or GG genotype
in rs1867785 was underrepresented in sprint/power
athletes (OR = 0.61, Table 2 and Figure 3). We note that the
genotype distribution of individual SNPs does not provide
information regarding the genotype distribution of their
combinations. For example, suppose SNP_A and SNP_B are two SNPs
with alleles A/a and B/b, respectively. In this example allele
A is advantageous to performance, and allele B is also advantageous
to performance but only when combined with
allele a. From this pattern, one would expect allele A to be
overrepresented in athletes, but the A + B combination to
be underrepresented in athletes.

Using the MARS model excluding rs1867785, the results
were almost identical to those of the MARS model
excluding rs11689011 (Additional file 3: Table S2). However,
the interaction between rs11689011 and rs4035887
in the Russian and Polish sample combined was not
significant (BIF = 49.5%).

Discussion 

We have examined the genotype distribution and SNP-SNP 
interaction of five SNPs in the first intron of the EPAS1
gene in European sprint/power and endurance athletes. We 
initially hypothesised that these SNPs would be associated 
with endurance athletic status. However, contrary to our 



Table 2 Covariates identified in the MARS model excluding rs11689011

Sample             Covariate            P-value^a  BIF^b  Genotype                                   Odds Ratio

Endurance athletes vs. controls
Russians           rs1867785*sex        0.00022    61.7   Other combinations                         1 (ref)
                                                          GA or GG in women                          0.39 (0.24-0.65)
Polish             -
Russians + Polish  -

Sprint/power athletes vs. controls
Russians           rs4035887            0.0072     43.6   GA or GG                                   1 (ref)
                                                          AA                                         0.54 (0.34-0.88)
                   rs1867785            0.0017     78.3   GA or GG                                   1 (ref)
                                                          AA                                         0.47 (0.25-0.84)
Polish             -
Russians + Polish  rs1867785            0.00016    90.1   GA + GG                                    1 (ref)
                                                          AA                                         0.53 (0.35-0.80)
                   rs4035887*rs1867785  0.00016    52.6   Other combinations                         1 (ref)
                                                          AA at rs4035887 and GA or GG at rs1867785  0.61 (0.45-0.85)

^a P-value obtained by logistic regression.
^b Bootstrap Inclusion Fraction calculated after running 10000 MARS models on 10000 bootstrap samples. A BIF of 90.1 indicates that the covariate of interest was
selected in 90.1% of the MARS models.
* denotes an interaction.



Table 3 Genotype frequencies of the three Single Nucleotide Polymorphisms (SNPs) significantly associated with athletic performance

                      Russians (Males + Females)                Polish (Males)
                      Controls      Endurance     Sprint/power  Controls      Endurance     Sprint/power
Genotypes  Level                    athletes      athletes                    athletes      athletes
                      (n = 175)     (n = 148)     (n = 246)     (n = 428)     (n = 106)     (n = 92)

rs11689011 (major/minor allele: T/C; model: Recessive)
CT or CC   All        139 (79.4%)   119 (80.4%)   220 (89.4%)   [illegible]   [illegible]   82 (89.1%)
           Elite      -             18 (72%)      97 (91.5%)    -             [illegible]   53 (91.4%)
           Sub-elite  -             101 (82.1%)   123 (87.9%)   -             [illegible]   29 (85.3%)
TT         All        36 (20.6%)    29 (19.6%)    26 (10.6%)    [illegible]   [illegible]   10 (10.9%)
           Elite      -             7 (28%)       9 (8.5%)      -             [illegible]   5 (8.6%)
           Sub-elite  -             22 (17.9%)    17 (12.1%)    -             [illegible]   5 (14.7%)

rs4035887 (major/minor allele: G/A; model: Dominant)
GA or GG   All        130 (74.3%)   119 (80.4%)   207 (84.1%)   297 (69.4%)   [illegible]   62 (67.4%)
           Elite      -             19 (76.0%)    88 (83.0%)    -             41 (63.1%)    37 (63.8%)
           Sub-elite  -             100 (81.3%)   119 (85.0%)   -             27 (65.9%)    25 (73.5%)
AA         All        45 (25.7%)    29 (19.6%)    39 (15.9%)    131 (30.6%)   38 (34.9%)    30 (32.6%)
           Elite      -             6 (24.0%)     18 (17.0%)    -             24 (36.9%)    21 (36.2%)
           Sub-elite  -             23 (18.7%)    21 (15.0%)    -             14 (31.4%)    9 (26.5%)

rs1867785 (major/minor allele: A/G; model: Recessive)
GA or GG   All        142 (81.1%)   122 (82.4%)   222 (90.2%)   356 (83.2%)   96 (90.6%)    82 (89.1%)
           Elite      -             18 (72%)      98 (92.5%)    -             61 (93.8%)    53 (91.4%)
           Sub-elite  -             104 (84.6%)   124 (88.6%)   -             35 (85.4%)    29 (85.3%)
AA         All        33 (18.9%)    25 (17.6%)    24 (9.8%)     72 (16.8%)    10 (9.4%)     10 (10.9%)
           Elite      -             7 (28%)       8 (7.5%)      -             4 (6.2%)      5 (8.6%)
           Sub-elite  -             19 (15.4%)    16 (11.4%)    -             6 (14.6%)     5 (14.7%)

[illegible]: value not recoverable from the source text.

Figure 2 Genotype distributions of rs1867785 in the different groups. C: controls, E: endurance athletes, S/P: sprint/power athletes, **:
p < 0.01 in linear regression, ***: p < 0.001 in linear regression.
Panels: Russians, Polish, Polish + Russians. Legend: GG or AG, AA.



hypotheses, our main findings were as follows: In the pooled
cohort of Polish and Russian athletes, 1) rs1867785 was
associated with sprint/power athletic status; the AA genotype
in rs1867785 was underrepresented in sprint/power
athletes, 2) rs11689011 was also associated with sprint/power
athletic status; the TT genotype in rs11689011
was underrepresented in sprint/power athletes, and 3) the
interaction between rs1867785/rs11689011, and rs4035887
was associated with sprint/power athletic performance;
the combinations of the AA genotype in rs4035887 with
either the GA or GG genotypes in rs1867785, or with
the CT or CC genotypes in rs11689011, were underrepresented
in two cohorts of sprint/power athletes.

Figure 3 Interaction between rs11689011, rs4035887 and athletic status in the different groups. C: controls, E: endurance athletes, S/P:
sprint/power, ***: p < 0.001 in linear regression.
Panels: Russians, Polish, Polish + Russians. Legend: GA or GG at rs1867785 + GA or GG at rs4035887; GA or GG at rs1867785 + AA at rs4035887;
AA at rs1867785 + GA or GG at rs4035887; AA at rs1867785 + AA at rs4035887.

Compared with a previous study, the results of the
present study were unexpected. Henderson et al. [12] have
reported that the G allele in rs1867785 and the T allele in
rs11689011 were overrepresented in endurance athletes,
whereas we have observed that the G allele in rs1867785
was underrepresented (and the A allele was overrepresented)
in female Russian endurance athletes. Furthermore, in our
cohort, the T allele in rs11689011 was overrepresented in
female Russian endurance athletes, but underrepresented
in male Polish endurance athletes. A possible explanation
for the discrepancies between our study and the study
by Henderson et al. [12] is that our results regarding
rs11689011 and rs1867785 were sex-specific. Sex-specific
effects of SNPs are common in a wide range of phenotypes
such as waist-hip ratio [28], susceptibility to sport-related
injury [29], and increased muscle strength in response to
training [30]. We have also recently shown that PPARGC1A
rs4697425 was associated with elite female, but not with
male endurance running performance [31]. However, the
Polish endurance sample only comprised males, while the
Russian sample included both males and females. Interestingly,
the cohort in the study by Henderson et al. included
both male and female athletes, but sex was not investigated
as a covariate in their analysis [12].

Looking at each SNP separately, we found that rs1867785
and rs11689011 were associated with sprint/power performance.
However, these SNPs had very similar genotype
distributions and are in strong LD (r² = 0.964); the
A allele in rs1867785 segregates with the T allele in
rs11689011. Since previous studies have suggested that
SNPs with strong LD cannot be considered in the same
statistical model [24,25], and only one of these SNPs
might influence sprint/power performance, we have created
two MARS models and considered only one of the SNPs
in each model. As expected, the two MARS models
yielded almost identical results in all cohorts. However,
only rs11689011 was associated with endurance athletic
status in the Polish sample, while this was not the case for
rs1867785. This might be explained by the small difference
in genotype distribution in rs11689011 and rs1867785 in
the Polish control sample. In addition, the BIF obtained for
rs11689011, when comparing Polish endurance athletes
with controls, was not strong enough (65.4%), indicating a
correlation close to non-significance and sensitive to small
changes in genotype frequencies. The relatively large sample
size generated from studying two cohorts of athletes in
the present study further reinforces the confidence in the
results of the present study.

Indeed, in the present study two European Caucasian
cohorts of athletes were grouped to explore the association
between SNPs in EPAS1 and athletic performance.
Our previous results indicated that combining two cohorts
of Caucasian athletes, especially when they are closely
related, would be a useful approach to detect an association
between SNPs and athletic status [13]. While previous studies
combined athletes from different ancestries [32,33], here
we have studied athletes from closely related European ancestries
(i.e., Polish and Russians). This is reinforced by the
similarity in the genotype distribution in the control groups
(no more than 5% differences between the Russian and
Polish controls). Furthermore, studying two cohorts of
athletes has increased the sample size (overall 338 sprint/power
athletes and 254 endurance athletes), which further
strengthened our results and the likelihood that these
specific EPAS1 SNPs show a genuine association with
elite sprint/power performance.

An additional novel finding in the present study is that
the AA genotype in rs1867785 and the TT genotype in
rs11689011 are even more underrepresented in elite
sprint/power athletes than in their sub-elite counterparts.
A similar pattern has previously been demonstrated for the
highly studied ACTN3 R577X SNP, as the 577XX genotype
was found at a lower frequency in elite sprint/power
athletes compared to their national-level counterparts
[34-36]. This observation indicates that while the EPAS1
SNPs are associated with the development of sprint/power
ability, they might be even more important in the
development of world-class sprint/power ability. This
finding, along with all the other findings in the present
study, was obtained using the Bootstrap Inclusion Fractions
(BIF) statistical method, which, as far as we are aware,
has never been used in sports genomics.

The BIF analysis is a useful technique for investigating 
variations among selected models in samples drawn at 
random with replacement. Such samples mimic datasets 
that are structurally similar to that under study and that 
could plausibly have arisen instead [37]. Initially designed 
to test the stability of multivariable models, this
non-parametric method allowed us to test whether the
EPAS1 SNPs selected by MARS were sensitive to small
changes in the data, and to confirm that they were unlikely
to be false positives. This method also allowed us to
assess the relative importance of the different variables;
while rs1867785 and rs11689011 showed very strong
main effects in sprint/power athletes, their interaction
with rs4035887 was of smaller importance.
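
The resampling idea described above can be sketched as follows.
This is a minimal illustration, not the analysis used in the
study: `toy_selector` is a hypothetical stand-in for the MARS
variable selection (it simply keeps a genotype flag whose
frequency differs between athletes and controls by more than an
arbitrary threshold), and the record layout, function names and
default values are all assumptions.

```python
import random
from collections import Counter


def bootstrap_inclusion_fractions(rows, select, n_boot=1000, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected.

    rows   : list of records understood by the selector
    select : function(list of records) -> iterable of selected variable names
    Variables that are never selected are absent from the returned dict.
    """
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # One bootstrap sample: n records drawn at random with replacement.
        sample = [rows[rng.randrange(n)] for _ in range(n)]
        counts.update(set(select(sample)))
    return {name: counts[name] / n_boot for name in counts}


def toy_selector(sample, threshold=0.10):
    """Hypothetical stand-in for MARS (NOT the model used in the study).

    Each record is a dict with a boolean "athlete" key plus boolean
    genotype flags, e.g. {"athlete": True, "rsX_AA": False}.
    """
    cases = [r for r in sample if r["athlete"]]
    controls = [r for r in sample if not r["athlete"]]
    if not cases or not controls:
        return []
    selected = []
    for name in sample[0]:
        if name == "athlete":
            continue
        p_case = sum(r[name] for r in cases) / len(cases)
        p_ctrl = sum(r[name] for r in controls) / len(controls)
        if abs(p_case - p_ctrl) > threshold:
            selected.append(name)
    return selected
```

A variable that is selected in nearly every resample is
insensitive to small changes in the data, whereas a variable
with a low inclusion fraction owes its selection to a few
particular observations.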

We have also shown that several SNPs within EPAS1 are
associated with endurance athletic status, in a sex-specific
manner. The TT genotype in rs11689011 was underrepresented
in the cohort of Polish endurance athletes. However,
this association was demonstrated only when the
Polish and Russian groups were analysed separately, and
was abolished when the two cohorts were combined. We
therefore cannot be certain that these positive findings
are not false positives, as they might be an artefact of the
smaller sample size. On the other hand, our positive
findings regarding the association between the EPAS1
SNPs and sprint/power athletic status were found in the
combined cohort of sprint/power athletes, and the BIFs
that were calculated for rs11689011 and rs1867785 were
extremely high (90.1% for rs1867785, and 93.6% for
rs11689011). Furthermore, these associations were more
pronounced when the athletes' level of competition was
considered.

This study is not without limitations. In case-control
studies, the relative proportions of controls and cases
affect the sample size required to detect an association
with a given power and significance level. In the present
study, with similar genotype distributions in the Russian
and Polish samples, to detect an association with the same
effect size, at the same power and significance level, we would
require a larger sample size in the Russian population.
We acknowledge that the difference in numbers of athletes
and controls in our study might therefore contribute to our
results and the lack of replication in both athlete groups.
However, in any association study with elite athletes it is a 
challenge to increase the sample size due to the very low 
number of elite athletes available to study. 
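
One way to see why the case:control ratio matters is through the
standard error of a difference in genotype frequency between
two groups, which depends on both group sizes. The sketch below
uses the usual two-proportion formula; the function name is
illustrative, and any frequencies passed to it would be
hypothetical values, not results from this study.

```python
from math import sqrt


def se_frequency_difference(p_cases, n_cases, p_controls, n_controls):
    """Standard error of (p_cases - p_controls) for two independent groups.

    p_* are genotype frequencies in [0, 1]; n_* are group sizes.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("group sizes must be positive")
    for p in (p_cases, p_controls):
        if not 0.0 <= p <= 1.0:
            raise ValueError("frequencies must lie between 0 and 1")
    return sqrt(p_cases * (1 - p_cases) / n_cases
                + p_controls * (1 - p_controls) / n_controls)


# Same total of 200 participants and the same (hypothetical) frequency:
# the unbalanced design gives a larger standard error than the balanced one.
balanced = se_frequency_difference(0.2, 100, 0.2, 100)
unbalanced = se_frequency_difference(0.2, 50, 0.2, 150)
```

For a fixed total number of participants, the more unequal the
two groups, the larger the standard error, so a larger overall
sample is needed to reach the same power.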

Finally, in previous reports EPAS1 SNPs have demonstrated
an association with performance-related blood parameters
(e.g., alterations in erythropoietin, hemoglobin
and hematocrit) [9-11], and elite endurance performance
[12], in humans. However, it has also been shown that
EPAS1-deficient mice have greater oxidative stress and an
impaired response to oxidative stress [38]. A reduction in 
hematocrit levels and a global decrease in peripheral blood 
counts have also been observed in EPAS1-null mice [39].
Although no specific SNPs were tested in the mouse models,
these studies illustrate the potential importance of the
EPAS1 gene in athletic-related phenotypes.

Conclusion 

We found an association between EPAS1 rs1867785 and
EPAS1 rs11689011 and sprint/power athletic status, and
an interaction between rs1867785/rs11689011 and rs4035887
and sprint/power athletic status in two cohorts of closely
related European athletes. Based on the statistical model
used, either rs1867785 or rs11689011 is related to
sprint/power athletic status. The association between rs1867785
and sprint/power athletic status is in line with a previous
study in Australian athletes [12]. Unlike the vast number of
investigations into the genetics of endurance performance,
the genetic influence on elite sprint/power performance has
received limited attention, and only a few studies have
characterized the associations between genetic variants and elite
sprint/power performance. Most studies to date have
recruited only one cohort of athletes and were therefore
hampered by insufficient sample size. In the present study, we
have combined two cohorts of athletes and used innovative
statistical methods, which provide confidence in our results.
Functional studies directly demonstrating cause and effect,
or providing any proposed cellular or molecular
mechanisms to explain the association, are needed to extend and
validate these findings.